Review for NeurIPS paper: Evolving Normalization-Activation Layers
Weaknesses: The lack of an ablation study for the two rejection protocols is my main concern and the principal component of my rating. While the experiments cover a wide range of architectures and normalization-activation layers, it is not clear how the two rejection protocols contribute to the final results. Although both are well motivated by the two observations, the observations themselves are not sufficient to justify the protocols. Evolution is extremely creative, and the more constraints we manually put on it, the more we limit its creativity. More specifically, the search space for complex problems is usually very deceptive. For example, a candidate might be numerically unstable according to the stability criterion yet have the potential to evolve into a surprisingly powerful one later on; under the current protocol it might be rejected early. In Table 3, random search with rejection also achieved very good results, and the authors' EvoNorm outperformed it by only a small margin, which also makes me question the effectiveness of the search method itself.
Evolving Normalization-Activation Layers
Normalization layers and activation functions are fundamental components in deep networks and typically co-locate with each other. Here we propose to design them using an automated approach. Instead of designing them separately, we unify them into a single tensor-to-tensor computation graph, and evolve its structure starting from basic mathematical functions. Examples of such mathematical functions are addition, multiplication and statistical moments. The use of low-level mathematical functions, in contrast to the use of high-level modules in mainstream NAS, leads to a highly sparse and large search space which can be challenging for search methods.
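To make the idea of a single tensor-to-tensor computation graph built from low-level primitives more concrete, here is a minimal sketch in plain Python. It is an illustration only, not the authors' implementation: the primitive set, the graph encoding, the names `PRIMITIVES`, `EXAMPLE_GRAPH`, `evaluate` and `mutate`, and the use of flat lists of floats in place of real tensors are all assumptions made for this example.

```python
import math
import random


def _mean(x):
    return sum(x) / len(x)


def _inv_std(x, eps=1e-5):
    # 1 / sqrt(variance + eps); eps is an assumed stabilizer, not a value from the paper.
    m = _mean(x)
    var = sum((v - m) * (v - m) for v in x) / len(x)
    return 1.0 / math.sqrt(var + eps)


# Hypothetical primitive set: name -> (arity, function on equal-length lists of floats).
PRIMITIVES = {
    "add": (2, lambda a, b: [p + q for p, q in zip(a, b)]),
    "mul": (2, lambda a, b: [p * q for p, q in zip(a, b)]),
    "max": (2, lambda a, b: [max(p, q) for p, q in zip(a, b)]),
    "neg": (1, lambda a: [-p for p in a]),
    # tanh form of the sigmoid avoids overflow for large |p|.
    "sigmoid": (1, lambda a: [0.5 * (1.0 + math.tanh(0.5 * p)) for p in a]),
    # Statistical moments, broadcast back to the input shape.
    "mean": (1, lambda a: [_mean(a)] * len(a)),
    "inv_std": (1, lambda a: [_inv_std(a)] * len(a)),
}

# A graph is a list of (op, input_indices). Index 0 is the layer input x;
# index i >= 1 is the output of the (i-1)-th node. The last node is the output.
# This hand-written example computes x * sigmoid(x) * inv_std(x).
EXAMPLE_GRAPH = [
    ("sigmoid", (0,)),   # value 1
    ("mul", (0, 1)),     # value 2
    ("inv_std", (0,)),   # value 3
    ("mul", (2, 3)),     # value 4 (output)
]


def evaluate(graph, x):
    """Run the graph on input x and return the output of the last node."""
    values = [list(x)]
    for op, inputs in graph:
        arity, fn = PRIMITIVES[op]
        assert len(inputs) == arity
        values.append(fn(*(values[i] for i in inputs)))
    return values[-1]


def mutate(graph, rng):
    """Return a copy of the graph with one randomly chosen node resampled."""
    child = list(graph)
    pos = rng.randrange(len(child))
    op = rng.choice(sorted(PRIMITIVES))
    arity, _ = PRIMITIVES[op]
    # A node may only read the input or earlier nodes, so the graph stays acyclic.
    inputs = tuple(rng.randrange(pos + 1) for _ in range(arity))
    child[pos] = (op, inputs)
    return child


if __name__ == "__main__":
    x = [-1.0, 0.0, 1.0, 2.0]
    print(evaluate(EXAMPLE_GRAPH, x))
    print(mutate(EXAMPLE_GRAPH, random.Random(0)))
```

Because any node can be resampled to any primitive, most mutated graphs are unlikely to be useful layers, which gives a rough sense of why a search space built this way is large and sparse.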
Review for NeurIPS paper: Evolving Normalization-Activation Layers
The paper focuses on designing new neural architectures; it presents a new search space and new optimization criteria. The new search space includes tensor-to-tensor operators integrating activation and normalization functions; the criteria involve an early performance indicator (this is classical) and a stability indicator (this is new). The rebuttal addressed nearly all of the reviewers' concerns: the significance of the performance gains; the generality of the approach when applied to other architectures; the fairness of the evaluation (with a hold-out); and the impact of the stability indicator (lesion study). The AC would like the computational cost of the evolution to be spelled out in the revised paper (beyond "a relatively large number of CPUs"), for example, how many tournaments were run. As a suggestion, it might be interesting to see whether (and how) scale insensitivity (E.2) could be used as a third rejection criterion.